Dictionary Optimization for Block-Sparse Representations

Kevin Rosenblum, Lihi Zelnik-Manor, Yonina C. Eldar 






Abstract — Recent work has demonstrated that using a carefully 
designed dictionary instead of a predefined one can improve 
the sparsity in jointly representing a class of signals. This has 
motivated the derivation of learning methods for designing a 
dictionary which leads to the sparsest representation for a given 
set of signals. In some applications, the signals of interest can 
have further structure, so that they can be well approximated by 
a union of a small number of subspaces (e.g., face recognition and 
motion segmentation). This implies the existence of a dictionary 
which enables block-sparse representations of the input signals 
once its atoms are properly sorted into blocks. In this paper, we 
propose an algorithm for learning a block-sparsifying dictionary 
of a given set of signals. We do not require prior knowledge on the 
association of signals into groups (subspaces). Instead, we develop 
a method that automatically detects the underlying block struc- 
ture. This is achieved by iteratively alternating between updating 
the block structure of the dictionary and updating the dictionary 
atoms to better fit the data. Our experiments show that for block- 
sparse data the proposed algorithm significantly improves the 
dictionary recovery ability and lowers the representation error 
compared to dictionary learning methods that do not employ 
block structure. 



I. Introduction 

The framework of sparse coding aims at recovering an
unknown vector $\theta \in \mathbb{R}^K$ from an under-determined system of
linear equations $x = D\theta$, where $D \in \mathbb{R}^{N \times K}$ is a dictionary
and $x \in \mathbb{R}^N$ is an observation vector with $N < K$. Since the
system is under-determined, $\theta$ cannot be recovered without
additional information. The framework of compressed sensing
[1], [2] exploits sparsity of $\theta$ in order to enable recovery.
Specifically, when $\theta$ is known to be sparse, so that it contains
few nonzero coefficients, and when $D$ is chosen properly,
then $\theta$ can be recovered uniquely from $x = D\theta$. Recovery is
possible irrespective of the locations of the nonzero entries
of $\theta$. This result has given rise to a multitude of different
recovery algorithms. Most prominent among them are Basis
Pursuit (BP) [?], [?] and Orthogonal Matching Pursuit (OMP)
[?], [?].

Recent work [6], [7], [8], [9], [10] has demonstrated that
adapting the dictionary $D$ to fit a given set of signal
examples leads to improved signal reconstruction. At the price
of being slow, these learning algorithms attempt to find a
dictionary that leads to optimal sparse representations for
a certain class of signals. These methods show impressive
results for representations with arbitrary sparsity structures.
In some applications, however, the representations have a
unique sparsity structure that can be exploited. Our interest
is in the case of signals that are known to be drawn from
a union of a small number of subspaces [11], [12]. This
occurs naturally, for example, in face recognition [13], [14],
motion segmentation [15], multiband signals [16], [17], [18],
measurements of gene expression levels [19], and more. For
such signals, sorting the dictionary atoms according to the
underlying subspaces leads to sparse representations which
exhibit a block-sparse structure, i.e., the nonzero coefficients
occur in clusters of varying sizes. Several methods, such as
Block BP (BBP) [?], [20], [21] and Block OMP (BOMP) [22],
[23], have been proposed to take advantage of this structure in
recovering the block-sparse representation $\theta$. These methods
typically assume that the dictionary is predetermined and the
block structure is known.

In this paper we propose a method for designing a block- 
sparsifying dictionary for a given set of signals. In other 
words, we wish to find a dictionary that provides block- 
sparse representations best suited to the signals in a given 
set. To take advantage of the block structure via block-sparse 
approximation methods, it is necessary to know the block 
structure of the dictionary. We do not assume that it is known 
a priori. Instead, we infer the block structure from the data 
while adapting the dictionary. 

We start by formulating this task as an optimization prob- 
lem. We then present an algorithm for minimizing the pro- 
posed objective, which iteratively alternates between updating 
the block structure and updating the dictionary. The block 
structure is inferred by the agglomerative clustering of dic- 
tionary atoms that induce similar sparsity patterns. In other 
words, after finding the sparse representations of the training 
signals, the atoms are progressively merged according to the 
similarity of the sets of signals they represent. A variety of 
segmentation methods through subspace modeling have been 
proposed recently [24], [25], [26]. These techniques learn an 
underlying collection of subspaces based on the assumption 
that each of the samples lies close to one of them. However, 
unlike our method, they do not treat the more general case 
where the signals are drawn from a union of several subspaces. 

The dictionary blocks are then sequentially updated to 
minimize the representation error at each step. The proposed 
algorithm is an intuitive extension of the K-SVD algorithm [6], 
which yields sparsifying dictionaries by sequentially updating 
the dictionary atoms, to the case of block structures. In other 
words, when the blocks are of size 1 our cost function and 
the algorithm we propose reduce to K-SVD. Our experiments 
show that updating the dictionary block by block is preferred 
over updating the atoms in the dictionary one by one, as in 
K-SVD. 

We show empirically that both parts of the algorithm 
are indispensable to obtain high performance. While fixing 
a random block structure and applying only the dictionary 
update part leads to improved signal reconstruction compared 
to K-SVD, combining the two parts leads to even better results. 
Furthermore, our experiments show that K-SVD often fails to 
recover the underlying block structure. This is in contrast to 
our algorithm which succeeds in detecting most of the blocks. 

We begin by reviewing previous work on dictionary design
in Section II. In Section III-A we present an objective for
designing block-sparsifying dictionaries. We show that this
objective is a direct extension of the one used by K-SVD. We
then propose an algorithm for minimizing the proposed cost
function (Section III-B). In Section III-C we give a detailed
description of the algorithm for finding a block structure
and in Section III-D we describe the dictionary update part.
We evaluate the performance of the proposed algorithms and
compare them to previous work in Section IV.

Throughout the paper, we denote vectors by lowercase
letters, e.g., $x$, and matrices by uppercase letters, e.g., $A$. The
$j$th column of a matrix $A$ is written as $A_j$, and the $i$th row
as $A^i$. The sub-matrix containing the entries of $A$ in the rows
with indices $r$ and the columns with indices $c$ is denoted $A^r_c$.
The Frobenius norm is defined by $\|A\|_F \equiv \sqrt{\sum_j \|A_j\|_2^2}$. The
$i$th element of a vector $x$ is denoted $x[i]$, $\|x\|_p$ is its $\ell_p$-norm,
and $\|x\|_0$ counts the number of non-zero entries in $x$.



II. Prior work on dictionary design 

The goal in dictionary learning is to find a dictionary $D$ and
a representation matrix $\Theta$ that best match a given set of vectors
$X_i$ that are the columns of $X$. In addition, we would like each
vector $\Theta_i$ of $\Theta$ to be sparse. In this section we briefly review
two popular sparsifying dictionary design algorithms, K-SVD
[6], [27] and MOD (Method of Optimal Directions) [7]. We
will generalize these methods to block-sparsifying dictionary
design in Section III.

To learn an optimal dictionary, both MOD and K-SVD
attempt to optimize the same cost function for a given sparsity
measure $k$:

$$\min_{D,\Theta} \|X - D\Theta\|_F \quad \text{s.t.} \quad \|\Theta_i\|_0 \le k, \quad i = 1,\ldots,L \tag{1}$$

where $X \in \mathbb{R}^{N \times L}$ is a matrix containing $L$ given input
signals, $D \in \mathbb{R}^{N \times K}$ is the dictionary and $\Theta \in \mathbb{R}^{K \times L}$ is a
sparse representation of the signals. Note that the solution of
(1) is never unique due to the invariance of $D$ to permutation
and scaling of columns. This is partially resolved by requiring
normalized columns in $D$. We will therefore assume throughout
the paper that the columns of $D$ are normalized to have
$\ell_2$-norm equal to 1.

Problem (1) is non-convex and NP-hard in general. Both
MOD and K-SVD attempt to approximate (1) using a relaxation
technique which iteratively fixes all the parameters but
one, and optimizes the objective over the remaining variable.
In this approach the objective decreases (or is left unchanged)
at each step, so that convergence to a local minimum is
guaranteed. Since this might not be the global optimum, both
approaches are strongly dependent on the initial dictionary
$D^{(0)}$. The convention is to initialize $D^{(0)}$ as a collection of
$K$ data signals from the same class as the training signals $X$.

The first step of the $n$th iteration in both algorithms optimizes
$\Theta$ given a fixed dictionary $D^{(n-1)}$, so that (1) becomes:

$$\Theta^{(n)} = \arg\min_{\Theta} \|X - D^{(n-1)}\Theta\|_F \quad \text{s.t.} \quad \|\Theta_i\|_0 \le k, \quad i = 1,\ldots,L. \tag{2}$$

This problem can be solved approximately using sparse coding
methods such as BP or OMP for each column of $\Theta$, since the
problem is separable in these columns. Next, $\Theta^{(n)}$ is kept fixed
and the representation error is minimized over $D$:

$$D^{(n)} = \arg\min_{D} \|X - D\Theta^{(n)}\|_F. \tag{3}$$

The difference between MOD and K-SVD lies in the choice of
optimization method for $D^{(n)}$. While K-SVD converges faster
than MOD, both methods yield similar results.

The MOD algorithm treats the problem in (3) directly. This
problem has a closed form solution given by the pseudo-inverse:

$$D^{(n)} = X\Theta'^{(n)}\left(\Theta^{(n)}\Theta'^{(n)}\right)^{-1}. \tag{4}$$

Here we assume for simplicity that $\Theta^{(n)}\Theta'^{(n)}$ is invertible.

The K-SVD method solves (3) differently. The columns in
$D^{(n-1)}$ are updated sequentially, along with the corresponding
non-zero coefficients in $\Theta^{(n)}$. This parallel update leads to
a significant speedup while preserving the sparsity pattern
of $\Theta^{(n)}$. For $j = 1,\ldots,K$, the update is as follows. Let
$\omega_j \equiv \{i \in 1,\ldots,L \mid \Theta^j_i \ne 0\}$ be the set of indices corresponding
to columns in $\Theta^{(n)}$ that use the atom $D_j$, i.e., their $j$th
row is non-zero. Denote by $R_{\omega_j} = X_{\omega_j} - \sum_{i \ne j} D_i \Theta^i_{\omega_j}$ the
representation error of the signals $X_{\omega_j}$ excluding the contribution
of the $j$th atom. The representation error of the signals
with indices $\omega_j$ can then be written as $\|R_{\omega_j} - D_j \Theta^j_{\omega_j}\|_F$. The
goal of the update step is to minimize this representation error,
which is accomplished by choosing

$$D_j = U_1, \qquad \Theta^j_{\omega_j} = \Delta_1^1 V_1'.$$

Here $U \Delta V'$ is the Singular Value Decomposition (SVD) of
$R_{\omega_j}$. Note that the columns of $D$ remain normalized after the
update. The K-SVD algorithm obtains the dictionary update
by $K$ separate SVD computations, which explains its name.

III. Block-Sparsifying Dictionary Optimization


We now formulate the problem of block-sparsifying dic- 
tionary design. We then propose an algorithm which can be 
seen as a natural extension of K-SVD for the case of signals 
with block sparse representations. Our method involves an 
additional clustering step in order to determine the block 
structure. 

A. Problem definition 

For a given set of $L$ signals $X = \{X_i\}_{i=1}^{L} \in \mathbb{R}^N$, we wish
to find a dictionary $D \in \mathbb{R}^{N \times K}$ whose atoms are sorted in
blocks, and which provides the most accurate representation
vectors whose non-zero values are concentrated in a fixed
number of blocks. In previous works dealing with the block-sparse
model, it is typically assumed that the block structure
in $D$ is known a priori, and even more specifically, that the
atoms in $D$ are sorted according to blocks [?], [20]. Instead,
in this paper we address the more general case where the block
structure is unknown and the blocks can be of varying sizes.
The only assumption we make on the block structure is that
the maximal block size, denoted by $s$, is known.

Fig. 1. Two equivalent examples of dictionaries $D$ and block structures $d$
with 5 blocks, together with 2-block-sparse representations $\theta$. Both examples
represent the same signal, since the atoms in $D$ and the entries of $d$ and $\theta$
are permuted in the same manner.

More specifically, suppose we have a dictionary whose
atoms are sorted in blocks that enable block-sparse representations
of the input signals. Assume that each block is given an
index number. Let $d \in \mathbb{R}^K$ be the vector of block assignments
for the atoms of $D$, i.e., $d[i]$ is the block index of the atom
$D_i$. We say that a vector $\theta \in \mathbb{R}^K$ is $k$-block-sparse over $d$
if its non-zero values are concentrated in $k$ blocks only. This
is denoted by $\|\theta\|_{0,d} = k$, where $\|\theta\|_{0,d}$ is the $\ell_0$-norm over
$d$ and counts the number of non-zero blocks as defined by $d$.
Fig. 1 presents examples of two different block structures and
two corresponding block-sparse vectors and dictionaries.
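
As a small illustration of the block $\ell_0$-norm defined above, the following sketch counts the blocks that contain at least one non-zero coefficient. The function name `block_l0_norm` and the toy vectors are illustrative assumptions and do not come from the paper.

```python
def block_l0_norm(theta, d):
    """Count the blocks of d that hold at least one non-zero entry of theta.

    theta: sequence of K coefficients.
    d:     sequence of K block labels, d[i] being the block of atom i.
    """
    if len(theta) != len(d):
        raise ValueError("theta and d must have the same length")
    # Collect the labels of all atoms that carry a non-zero coefficient.
    active_blocks = {label for value, label in zip(theta, d) if value != 0}
    return len(active_blocks)


# Toy example (hypothetical data): five atoms in three blocks,
# with non-zero coefficients in blocks 1 and 3 only.
d = [1, 1, 2, 2, 3]
theta = [0.0, 0.5, 0.0, 0.0, -2.0]
print(block_l0_norm(theta, d))
```

With $d = [1, \ldots, K]$ every atom is its own block, so the same function returns the ordinary $\ell_0$-norm.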

Our goal is to find a dictionary $D$ and a block structure $d$,
with maximal block size $s$, that lead to optimal $k$-block-sparse
representations $\Theta = \{\Theta_i\}_{i=1}^{L}$ for the signals in $X$:

$$\min_{D,d,\Theta} \|X - D\Theta\|_F \quad \text{s.t.} \quad \|\Theta_i\|_{0,d} \le k, \; i = 1,\ldots,L, \quad |d_j| \le s, \; j \in d \tag{5}$$

where $d_j = \{i \in 1,\ldots,K \mid d[i] = j\}$ is the set of indices
belonging to block $j$ (i.e., the list of atoms in block $j$).

The case when there is no underlying block structure, or
when the block structure is ignored, is equivalent to setting $s = 1$
and $d = [1,\ldots,K]$. Substituting this into (5) reduces it to
(1). In this setting, the objective and the algorithm we propose
coincide with K-SVD. In Section IV we demonstrate through
simulations that when an underlying block structure exists,
optimizing (5) via the proposed framework improves recovery
results and lowers the representation errors with respect to (1).



B. Algorithm Preview

In this section, we propose a framework for solving (5).
Since this optimization problem is non-convex, we adopt the
coordinate relaxation technique. We initialize the dictionary
$D^{(0)}$ as the outcome of the K-SVD algorithm (using a random
collection of $K$ signals leads to similar results, but slightly
slower convergence). Then, at each iteration $n$ we perform
the following two steps:

1) Recover the block structure by solving (5) for $d$ and $\Theta$
while keeping $D^{(n-1)}$ fixed:

$$[d^{(n)}, \Theta^{(n)}] = \arg\min_{d,\Theta} \|X - D^{(n-1)}\Theta\|_F \quad \text{s.t.} \quad \|\Theta_i\|_{0,d} \le k, \; i = 1,\ldots,L, \quad |d_j| \le s, \; j \in d. \tag{6}$$

An exact solution would require a combinatorial search
over all feasible $d$ and $\Theta$. Instead, we propose a tractable
approximation to (6) in Section III-C, referred to as
Sparse Agglomerative Clustering (SAC). Agglomerative
clustering builds blocks by progressively merging the
closest atoms according to some distance metric [28],
[29]. SAC uses the $\ell_0$-norm for this purpose.

2) Fit the dictionary $D^{(n)}$ to the data by solving (5) for $D$
and $\Theta$ while keeping $d^{(n)}$ fixed:

$$[D^{(n)}, \Theta^{(n)}] = \arg\min_{D,\Theta} \|X - D\Theta\|_F \quad \text{s.t.} \quad \|\Theta_i\|_{0,d^{(n)}} \le k, \; i = 1,\ldots,L. \tag{7}$$

In Section III-D we propose an algorithm, referred to as
Block K-SVD (BK-SVD), for solving (7). This technique
can be viewed as a generalization of K-SVD since the
blocks in $D^{(n)}$ are sequentially updated together with
the corresponding non-zero blocks in $\Theta^{(n)}$.

In the following sections we describe in detail the steps
of this algorithm. The overall framework is summarized in
Algorithm 1.

Algorithm 1 Block-Sparse Dictionary Design
Input: A set of signals $X$, block sparsity $k$ and maximal block
size $s$.

Task: Find a dictionary $D$, block structure $d$ and the corresponding
sparse representation $\Theta$ by optimizing:

$$\min_{D,d,\Theta} \|X - D\Theta\|_F \quad \text{s.t.} \quad \|\Theta_i\|_{0,d} \le k, \; i = 1,\ldots,L, \quad |d_j| \le s, \; j \in d.$$

Initialization: Set the initial dictionary $D^{(0)}$ as the outcome of
K-SVD.

Repeat from $n = 1$ until convergence:

1) Fix $D^{(n-1)}$, and update $d^{(n)}$ and $\Theta^{(n)}$ by applying
Sparse Agglomerative Clustering.

2) Fix $d^{(n)}$, and update $D^{(n)}$ and $\Theta^{(n)}$ by applying BK-SVD.

3) $n = n + 1$.



C. Block Structure Recovery: Sparse Agglomerative Clustering

In this section we propose a method for recovering the block
structure $d$ given a fixed dictionary $D$, as outlined in Fig. 2(a).
The suggested method is based on the coordinate relaxation
technique to solve (6) efficiently. We start by initializing $d$ and
$\Theta$. Since we have no prior knowledge on $d$, it is initialized as
$K$ blocks of size 1, i.e., $d = [1,\ldots,K]$. To initialize $\Theta$ we
keep $d$ fixed and solve (6) over $\Theta$ using OMP with $k \times s$
instead of $k$ non-zero entries, since the signals are known to
be combinations of $k$ blocks of size $s$. Based on the obtained
$\Theta$, we first update $d$ as described below and then again $\Theta$
using BOMP [22]. The BOMP algorithm sequentially selects
the dictionary blocks that best match the input signals $X_i$, and
can be seen as a generalization of the OMP algorithm to the
case of blocks.
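
The block selection step described above can be sketched as follows. This is a minimal NumPy illustration of a BOMP-style greedy loop for a single signal, not the reference implementation of [22]: the function name `bomp`, the stopping tolerance `tol`, and the choice to score a block by the $\ell_2$-norm of its correlations with the residual are assumptions made for the example.

```python
import numpy as np


def bomp(x, D, d, k, tol=1e-10):
    """Greedy block-sparse approximation of x over dictionary D.

    x: signal of length N.          D: N x K dictionary.
    d: length-K block labels.       k: number of blocks to select.
    Returns a length-K coefficient vector supported on at most k blocks.
    """
    d = np.asarray(d)
    labels = np.unique(d)
    theta = np.zeros(D.shape[1])
    residual = np.asarray(x, dtype=float).copy()
    chosen = []
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break  # the signal is already represented exactly
        # Score each unused block by how strongly it correlates with the residual.
        scores = [
            -1.0 if j in chosen else np.linalg.norm(D[:, d == j].T @ residual)
            for j in labels
        ]
        chosen.append(labels[int(np.argmax(scores))])
        # Re-fit the coefficients of all selected atoms by least squares.
        atoms = np.flatnonzero(np.isin(d, chosen))
        coef, *_ = np.linalg.lstsq(D[:, atoms], x, rcond=None)
        residual = x - D[:, atoms] @ coef
    if chosen:
        theta[atoms] = coef
    return theta


# Sanity check on an orthonormal dictionary with three blocks of size 2,
# where the greedy selection is exact (hypothetical toy data).
D = np.eye(6)
d = [0, 0, 1, 1, 2, 2]
x = np.array([1.0, 2.0, 0.0, 0.0, 0.0, 3.0])
theta = bomp(x, D, d, k=2)
print(theta)
```

Setting `d = list(range(K))` turns every atom into its own block, in which case the loop behaves like plain OMP with `k` non-zero entries.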

To update $d$ we wish to solve (6) while keeping $\Theta$ fixed.
Although the objective does not depend on $d$, the constraints
do. Therefore, the problem becomes finding a block structure
with maximal block size $s$ that meets the constraint on the
block-sparsity of $\Theta$. To this end, we seek to minimize the
block-sparsity of $\Theta$ over $d$:

$$\min_{d} \sum_{i=1}^{L} \|\Theta_i\|_{0,d} \quad \text{s.t.} \quad |d_j| \le s, \; j \in d. \tag{8}$$



Before we describe how (8) is optimized we first wish to
provide some insight. When a signal $X_i$ is well represented
by the unknown block $d_j$, then the corresponding rows in $\Theta_i$
are likely to be non-zero. Therefore, rows of $\Theta$ that exhibit
a similar pattern of non-zeros are likely to correspond to
columns of the same dictionary block. Consequently, grouping
dictionary columns into blocks is equivalent to grouping rows
of $\Theta$ according to their sparsity pattern. To detect rows with
similar sparsity patterns we next rewrite the objective of (8)
as a function of the pattern of non-zeros.

Let $\omega_j(\Theta,d)$ denote the list of columns in $\Theta$ that have
non-zero values in rows corresponding to block $d_j$, i.e.,
$\omega_j(\Theta,d) = \{i \in 1,\ldots,L \mid \|\Theta_i^{d_j}\|_2 > 0\}$. Problem (8) can
now be rewritten as:

$$\min_{d} \sum_{j \in d} |\omega_j(\Theta,d)| \quad \text{s.t.} \quad |d_j| \le s, \; j \in d \tag{9}$$

where $|\omega_j|$ denotes the size of the list $\omega_j$. We propose using a
sub-optimal tractable agglomerative clustering algorithm [29]
to minimize this objective. At each step we merge the pair
of blocks that have the most similar pattern of non-zeros in
$\Theta$, leading to the steepest descent in the objective. We allow
merging blocks as long as the maximum block size $s$ is not
exceeded.

More specifically, at each step we find the pair of blocks
$(j_1^*, j_2^*)$ such that:

$$[j_1^*, j_2^*] = \arg\max_{j_1 \ne j_2} |\omega_{j_1} \cap \omega_{j_2}| \quad \text{s.t.} \quad |d_{j_1}| + |d_{j_2}| \le s.$$

We then merge $j_1^*$ and $j_2^*$ by setting $\forall i \in d_{j_2^*} : d[i] \leftarrow j_1^*$,
$\omega_{j_1^*} \leftarrow \{\omega_{j_1^*} \cup \omega_{j_2^*}\}$, and $\omega_{j_2^*} \leftarrow \emptyset$. This is repeated until
no blocks can be merged without breaking the constraint
on the block size. We do not limit the intersection size
for merging blocks from below, since merging is always
beneficial. Merging blocks that have nothing in common may
not reduce the objective of (8); however, this can still lower
the representation error at the next BK-SVD iteration. Indeed,
while the number of blocks $k$ stays fixed, the number of atoms
that can be used to reduce the error increases.

Fig. 2(b) presents an example that illustrates the notation
and the steps of the algorithm. In this example the maximal
block size is $s = 2$. At initialization the block structure is set
to $d = [1,2,3,4]$, which implies that the objective of (8) is
$\sum_{i=1}^{4} \|\Theta_i\|_{0,d} = 2 + 1 + 2 + 2 = 7$. At the first iteration, $\omega_1$
and $\omega_3$ have the largest intersection. Consequently, blocks 1
and 3 are merged. At the second iteration, $\omega_2$ and $\omega_4$ have the
largest intersection, so that blocks 2 and 4 are merged. This
results in the block structure $d = [1,2,1,2]$, where no blocks
can be merged without surpassing the maximal block size.
The objective of (8) is reduced to $\sum_{i=1}^{4} \|\Theta_i\|_{0,d} = 4$, since
all 4 columns in $\Theta$ are 1-block-sparse. Note that since every
column contains non-zero values, this is the global minimum
and therefore the algorithm succeeded in solving (8).
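
The merging rule can be sketched in a few lines of plain Python. The sketch below works directly on the sparsity pattern of $\Theta$, given as one set of signal indices per atom, and uses 0-based block labels. The function names, the tie-breaking rule (first pair found wins) and the toy pattern are assumptions for illustration; the pattern is not the one drawn in Fig. 2(b), although it also starts from an objective of 7 and ends at 4.

```python
from itertools import combinations


def sparse_agglomerative_clustering(support, s):
    """Greedy block merging on a sparsity pattern.

    support[i]: set of signal indices whose representation uses atom i.
    s:          maximal block size.
    Returns d, a list with the block label of every atom.
    """
    K = len(support)
    d = list(range(K))                        # start from K blocks of size 1
    omega = {j: set(support[j]) for j in range(K)}
    size = {j: 1 for j in range(K)}
    while True:
        best = None
        for j1, j2 in combinations(sorted(omega), 2):
            if size[j1] + size[j2] > s:
                continue                      # merge would exceed the block size
            overlap = len(omega[j1] & omega[j2])
            if best is None or overlap > best[0]:
                best = (overlap, j1, j2)
        if best is None:
            return d                          # no admissible merge is left
        _, j1, j2 = best
        d = [j1 if label == j2 else label for label in d]
        omega[j1] |= omega.pop(j2)
        size[j1] += size.pop(j2)


def block_sparsity_objective(support, d):
    """Sum over blocks of |omega_j|, i.e. the objective of (9)."""
    omega = {}
    for atom, label in enumerate(d):
        omega.setdefault(label, set()).update(support[atom])
    return sum(len(signals) for signals in omega.values())


# Hypothetical pattern: 4 atoms, 4 signals, maximal block size 2.
support = [{0, 1}, {2, 3}, {0, 1}, {3}]
print(block_sparsity_objective(support, [0, 1, 2, 3]))   # before merging
d = sparse_agglomerative_clustering(support, s=2)
print(d, block_sparsity_objective(support, d))           # after merging
```

Because pairs with an empty intersection are still admissible, the loop keeps merging until the size constraint blocks every remaining pair, in line with the remark that merging is not limited from below.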

While more time-efficient clustering methods exist, we have 
selected agglomerative clustering because it provides a simple 
and intuitive solution to our problem. Partitional clustering 
methods, such as K-Means, require initialization and are 
therefore not suited for highly sparse data and the $\ell_0$-norm 
metric. Moreover, since oversized blocks are unwanted, it is 
preferable to limit the block size rather than the number of 
blocks. It is important to note that due to the iterative nature 
of our dictionary design algorithm, clustering errors can be 
corrected in the following iteration, after the dictionary has 
been refined. 



D. Block K-SVD Algorithm 

We now propose the BK-SVD algorithm for recovering the
dictionary $D$ and the representations $\Theta$ by optimizing (7) given
a block structure $d$ and input signals $X$.

Using the coordinate relaxation technique, we solve this
problem by minimizing the objective based on alternating $\Theta$
and $D$. At each iteration $m$, we first fix $D^{(m-1)}$ and use
BOMP to solve (7), which reduces to

$$\Theta^{(m)} = \arg\min_{\Theta} \|X - D^{(m-1)}\Theta\|_F \quad \text{s.t.} \quad \|\Theta_i\|_{0,d} \le k, \; i = 1,\ldots,L. \tag{10}$$

Next, to obtain $D^{(m)}$ we fix $\Theta^{(m)}$, $d$ and $X$, and solve:

$$D^{(m)} = \arg\min_{D} \|X - D\Theta^{(m)}\|_F. \tag{11}$$

Fig. 2. (a) A flow chart describing the SAC algorithm. (b) A detailed example
of the decision making process in the SAC algorithm.

Inspired by the K-SVD algorithm, the blocks in $D^{(m-1)}$ are
updated sequentially, along with the corresponding non-zero
coefficients in $\Theta^{(m)}$. For every block $j \in d$, the update is as
follows. Denote by $R_{\omega_j} = X_{\omega_j} - \sum_{i \ne j} D_{d_i} \Theta^{d_i}_{\omega_j}$ the representation
error of the signals $X_{\omega_j}$ excluding the contribution of
the $j$th block. Here $\omega_j$ and $d_j$ are defined as in the previous
subsection. The representation error of the signals with indices
$\omega_j$ can then be written as $\|R_{\omega_j} - D_{d_j} \Theta^{d_j}_{\omega_j}\|_F$. Finally, the
representation error is minimized by setting $D_{d_j} \Theta^{d_j}_{\omega_j}$ equal to
the matrix of rank $|d_j|$ that best approximates $R_{\omega_j}$. This can
be obtained by the following updates:

$$D_{d_j} = [U_1, \ldots, U_{|d_j|}], \qquad \Theta^{d_j}_{\omega_j} = [\Delta_1^1 V_1, \ldots, \Delta_{|d_j|}^{|d_j|} V_{|d_j|}]'$$

where the $|d_j|$ highest rank components of $R_{\omega_j}$ are computed
using the SVD $R_{\omega_j} = U \Delta V'$. The updated $D_{d_j}$ is now an
orthonormal basis that optimally represents the signals with
indices $\omega_j$. Note that the representation error is also minimized
when multiplying $D_{d_j}$ on the right by $W$ and $\Theta^{d_j}_{\omega_j}$ on the
left by $W^{-1}$, where $W \in \mathbb{R}^{|d_j| \times |d_j|}$ is an invertible matrix.
However, if we require the dictionary blocks to be orthonormal
subspaces, the solution is unique up to the permutation of
the atoms. It is also important to note that if $|d_j| > |\omega_j|$,
then $|d_j| - |\omega_j|$ superfluous atoms in block $j$ can be discarded
without any loss of performance.
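
The single-block update can be sketched with a truncated SVD of the residual. The NumPy function below is a minimal illustration under stated assumptions: the name `bksvd_block_update`, the in-place modification of `D` and `Theta`, and the handling of superfluous atoms (their coefficients on $\omega_j$ are set to zero and the atoms are left unchanged) are choices made for the example rather than details taken from the paper.

```python
import numpy as np


def bksvd_block_update(X, D, Theta, block_atoms):
    """Update one dictionary block and its coefficients in place.

    X: N x L signals.  D: N x K dictionary.  Theta: K x L coefficients.
    block_atoms: indices of the atoms that form the block d_j.
    """
    block_atoms = np.asarray(block_atoms)
    # omega_j: signals that currently use at least one atom of the block.
    omega = np.flatnonzero(np.linalg.norm(Theta[block_atoms], axis=0) > 0)
    if omega.size == 0:
        return
    # Residual of those signals without the contribution of this block.
    others = np.setdiff1d(np.arange(D.shape[1]), block_atoms)
    R = X[:, omega] - D[:, others] @ Theta[np.ix_(others, omega)]
    # Best rank-|d_j| approximation of R through its leading singular triplets.
    U, sv, Vt = np.linalg.svd(R, full_matrices=False)
    r = min(block_atoms.size, sv.size)
    D[:, block_atoms[:r]] = U[:, :r]
    Theta[np.ix_(block_atoms[:r], omega)] = sv[:r, None] * Vt[:r]
    # Atoms beyond the available rank are superfluous for these signals.
    Theta[np.ix_(block_atoms[r:], omega)] = 0.0


# Hypothetical toy problem: 3 blocks of size 2, one active block per signal.
rng = np.random.default_rng(0)
N, K, L = 8, 6, 40
d = np.array([0, 0, 1, 1, 2, 2])
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)
Theta = np.zeros((K, L))
for i in range(L):
    Theta[d == rng.integers(3), i] = rng.standard_normal(2)
X = rng.standard_normal((N, L))

error_before = np.linalg.norm(X - D @ Theta)
bksvd_block_update(X, D, Theta, np.flatnonzero(d == 0))
error_after = np.linalg.norm(X - D @ Theta)
print(error_before, error_after)
```

Only the columns in $\omega_j$ are touched, so the block-sparsity pattern of `Theta` is preserved, and the updated block has orthonormal columns because they are left singular vectors of the residual.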

This dictionary update minimizes the representation error
while preserving the sparsity pattern of $\Theta^{(m)}$, as in the K-SVD
dictionary update step. However, the update step in the BK-SVD
algorithm converges faster thanks to the simultaneous
optimization of the atoms belonging to the same block. Our
simulations show that it leads to smaller representation errors
as well. Moreover, the dictionary update step in BK-SVD
requires about $s$ times fewer SVD computations, which makes
the proposed algorithm significantly faster than K-SVD.

We next present a simple example illustrating the advantage
of the BK-SVD dictionary update step, compared to the K-SVD
update. Let $D_1$ and $D_2$ be the atoms of the same
block, of size 2. A possible scenario is that $D_2 = U_1$
and $\Theta^2_{\omega_j} = -\Delta(1,1)V_1'$. In K-SVD, the first update of
$D$ is $D_1 \leftarrow U_1$ and $\Theta^1_{\omega_j} \leftarrow \Delta(1,1)V_1'$. In this case the
second update would leave $D_2$ and $\Theta^2_{\omega_j}$ unchanged. As a
consequence, only the highest rank component of $R_{\omega_j}$ is
removed. Conversely, in the proposed BK-SVD algorithm, the
atoms $D_1$ and $D_2$ are updated simultaneously, resulting in the
two highest rank components of $R_{\omega_j}$ being removed.

IV. Experiments 

In this section, we evaluate the contribution of the proposed 
block-sparsifying dictionary design framework empirically. 
We also examine the performance of the SAC and the BK- 
SVD algorithms separately. 

For each simulation, we repeat the following procedure 50
times: We randomly generate a dictionary $D^*$ of dimension
$30 \times 60$ with normally distributed entries and normalize its
columns. The block structure is chosen to be of the form:

$$d^* = [1,1,1,2,2,2,\ldots,20,20,20]$$

i.e., $D^*$ consists of 20 subspaces of size $s = 3$. We generate
$L = 5000$ test signals $X$ of dimension $N = 30$ that
have 2-block-sparse representations $\Theta^*$ with respect to $D^*$
(i.e., $k = 2$). The generating blocks are chosen randomly
and independently and the coefficients are i.i.d. uniformly
distributed. White Gaussian noise with varying SNR was
added to $X$.
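
The data generation procedure described above can be sketched as follows. Several details are not specified in the text and are assumptions of this sketch: the uniform coefficients are drawn from $[-1, 1]$, the SNR is taken as $20\log_{10}(\|X\|_F / \|\text{noise}\|_F)$ over the whole signal matrix, block labels are 0-based, and the function name and `seed` parameter are illustrative.

```python
import numpy as np


def make_block_sparse_data(N=30, n_blocks=20, s=3, L=5000, k=2,
                           snr_db=None, seed=0):
    """Generate a random dictionary, block structure and block-sparse signals."""
    rng = np.random.default_rng(seed)
    K = n_blocks * s
    # Dictionary with normally distributed entries and unit-norm columns.
    D = rng.standard_normal((N, K))
    D /= np.linalg.norm(D, axis=0)
    # Block structure [0,0,0,1,1,1,...]: n_blocks consecutive blocks of size s.
    d = np.repeat(np.arange(n_blocks), s)
    # Each signal uses k randomly chosen blocks with uniform coefficients
    # (the range [-1, 1] is an assumption).
    Theta = np.zeros((K, L))
    for i in range(L):
        blocks = rng.choice(n_blocks, size=k, replace=False)
        rows = np.flatnonzero(np.isin(d, blocks))
        Theta[rows, i] = rng.uniform(-1.0, 1.0, size=rows.size)
    X = D @ Theta
    if snr_db is not None:
        # Scale white Gaussian noise to the requested SNR (assumed definition).
        noise = rng.standard_normal(X.shape)
        noise *= np.linalg.norm(X) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
        X = X + noise
    return D, d, Theta, X


D_star, d_star, Theta_star, X = make_block_sparse_data(L=200, snr_db=20)
print(D_star.shape, Theta_star.shape, X.shape)
```

Passing `snr_db=None` gives the noiseless setting used in the experiments that vary $k$ or the number of iterations.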

We perform three experiments: 

1) Given D* and X, we examine the ability of SAC to 
recover d*. 

2) Given d* and X, we examine the ability of BK-SVD to 
recover D*. 

3) We examine the ability of BK-SVD combined with SAC 
to recover D* and d* given only X. 



Fig. 3. Simulation results of the SAC algorithm. The graphs show $e$, $p$ and
$b$ as a function of the SNR of the data signals for $k = 2$ (a, b, c), and as a
function of $k$ in a noiseless setting (d, e, f).



We use two measures to evaluate the success of the simulations
based on their outputs $D$, $d$ and $\Theta$:

• The normalized representation error $e = \frac{\|X - D\Theta\|_F}{\|X\|_F}$.

• The percentage $p$ of successfully recovered blocks. For
every block in $D$, we match the closest block in $D^*$ without
repetition, where the (normalized) distance between
two blocks $S_1$ and $S_2$ (of sizes $s_1$ and $s_2$) is measured
by:

$$\text{Dist}(S_1, S_2) \equiv \sqrt{1 - \frac{\|S_1' S_2\|_F^2}{\max(s_1, s_2)}}$$

assuming that both blocks are orthonormalized. If the
distance between the block in $D$ and its matched block
in $D^*$ is smaller than 0.01, we consider the recovery of
this block as successful.
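
Both measures are straightforward to compute. The sketch below assumes the block distance has the form $\sqrt{1 - \|S_1' S_2\|_F^2 / \max(s_1, s_2)}$, orthonormalizes each block with a QR decomposition, and clips small negative rounding errors before the square root. The function names are illustrative, and the matching of blocks without repetition is not covered here.

```python
import numpy as np


def normalized_error(X, D, Theta):
    """Normalized representation error e = ||X - D Theta||_F / ||X||_F."""
    return np.linalg.norm(X - D @ Theta) / np.linalg.norm(X)


def block_distance(S1, S2):
    """Normalized distance between the subspaces spanned by two blocks."""
    # Orthonormalize the blocks so that only the spanned subspaces matter.
    Q1, _ = np.linalg.qr(S1)
    Q2, _ = np.linalg.qr(S2)
    overlap = np.linalg.norm(Q1.T @ Q2, "fro") ** 2
    value = 1.0 - overlap / max(Q1.shape[1], Q2.shape[1])
    return np.sqrt(max(value, 0.0))   # guard against tiny negative rounding


# Two bases of the same plane are at distance ~0; orthogonal planes at 1.
S1 = np.eye(4)[:, :2]
S2 = S1 @ np.array([[1.0, 1.0], [0.0, 1.0]])
S3 = np.eye(4)[:, 2:]
print(block_distance(S1, S2), block_distance(S1, S3))
```

Under this reading, a recovered block would count as successful when `block_distance` to its matched block in $D^*$ is below 0.01.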

A. Evaluating SAC 

To evaluate the performance of the SAC algorithm, we
assume that $D^*$ is known, and use SAC to reconstruct $d^*$
and then BOMP to approximate $\Theta^*$. The SAC algorithm is
evaluated as a function of the SNR of the signals $X$ for $k = 2$,
and as a function of $k$ in a noiseless setting. In addition to
$e$ and $p$, Fig. 3 also shows the objective of (8), which we
denote by $b$. We compare our results with those of an "oracle"
algorithm, which is given as input the true block structure $d^*$.
It then uses BOMP to find $\Theta$. The oracle's results provide a
lower bound on the reconstruction error of our algorithm (we
cannot expect our algorithm to outperform the oracle). It can
be seen that for SNR higher than $-5$ [dB], the percentage $p$ of
successfully recovered blocks quickly increases to 100%
(Fig. 3(b)), the representation error $e$ drops to zero (Fig. 3(a)) and
the block-sparsity $b$ drops to the lowest possible value $k = 2$
(Fig. 3(c)). Fig. 3(e) shows that the block structure $d^*$ is
perfectly recovered for $k < 6$. However, for $k = 6$, SAC fails
in reconstructing the block structure $d^*$, even though the block
sparsity $b$ reaches the lowest possible value (Fig. 3(f)). This is
a consequence of the inability of OMP to recover the sparsest
approximation of the signals $X$ with $k \times s = 12$ nonzero
entries. In terms of $e$ and $b$, our algorithm performs nearly as
well as the oracle.

Fig. 4. Simulation results of the BK-SVD and (B)K-SVD algorithms. The
graphs show the reconstruction error $e$ and the recovery percentage $p$ as a
function of the SNR of the data signals for $k = 2$ and after 250 iterations (a,
b), as a function of the number of iterations for $k = 2$ in a noiseless setting
(c, d), and as a function of $k$ in a noiseless setting after 250 iterations (e, f).

B. Evaluating BK-SVD 

To evaluate the performance of the BK-SVD algorithm we
assume that the block structure $d^*$ is known. We initialize
the dictionary $D^{(0)}$ by generating 20 blocks of size 3, where
each block is a randomly generated linear combination of
2 randomly selected blocks of $D^*$. We then evaluate the
contribution of the proposed BK-SVD algorithm. Recall that
dictionary design consists of iterations between two steps,
updating $\Theta$ using block-sparse approximation and updating the
blocks in $D$ and their corresponding non-zero representation
coefficients. To evaluate the contribution of the latter step,
we compare its performance with that of applying the same
scheme, but using the K-SVD dictionary update step. We refer
to this algorithm as (B)K-SVD. The algorithms are evaluated
as a function of the SNR of the signals $X$ for $k = 2$ after
250 iterations, as a function of the number of iterations for
$k = 2$ in a noiseless setting, and as a function of $k$ in a
noiseless setting after 250 iterations. It is clear from Fig. 4
that the simultaneous update of the atoms in the blocks of
$D$ is imperative and does not only serve as a speedup of the
algorithm.

C. Evaluating the overall framework 

To evaluate the performance of the overall block-sparsifying
dictionary design method, we combine SAC and BK-SVD. At
each iteration we only run BK-SVD once instead of waiting
for it to converge, improving the ability of the SAC algorithm
to avoid traps. Our results are compared with those of K-SVD
(with a fixed number of 8 coefficients) and with those of BK-SVD
(with a fixed block structure). The algorithms are
evaluated as a function of the SNR of the signals $X$ for $k = 2$
after 250 iterations, as a function of the number of iterations
for $k = 2$ in a noiseless setting, and as a function of $k$ in a
noiseless setting after 250 iterations (Fig. 5).

Fig. 5. Simulation results of our overall algorithm (BK-SVD+SAC), the BK-SVD
algorithm and the K-SVD algorithm. The graphs show the reconstruction
error $e$ and the recovery percentage $p$ as a function of the SNR of the data
signals for $k = 2$ after 250 iterations (a, b), as a function of the number of
iterations for $k = 2$ in a noiseless setting (c, d), and as a function of $k$ in a
noiseless setting after 250 iterations (e, f).

Our experiments show that for SNR > 10[dB], the proposed 
block-sparsifying dictionary design algorithm yields lower 
reconstruction errors (see Fig. |5](a)) and a higher percentage 
of correctly reconstructed blocks (see Fig. |5](b)), compared 
to K-SVD. Moreover, even in a noiseless setting, the K-SVD 
algorithm fails to recover the sparsifying dictionary, while our 
algorithm succeeds in recovering 93% of the dictionary blocks, 
as shown in Fig. |5](d). 

For SNR < 10 dB we observe that K-SVD reaches a lower
reconstruction error than our block-sparsifying dictionary
design algorithm. This is because at low SNR the
block structure is no longer present in the data, and the use of
block-sparse approximation algorithms is unjustified. To verify
that this is indeed the cause of the failure of our algorithm, we
further compare our results with those of an oracle algorithm,
which is given as input the true dictionary D* and block
structure d*. It then uses BOMP to find Θ. Fig. 5 shows that
for all noise levels, our algorithm performs nearly as well
as the oracle. Furthermore, for SNR < 10 dB we observe
that K-SVD outperforms the oracle, implying that the use of
block-sparsifying dictionaries is unjustified. For k ≤ 3, in a
noiseless setting, the performance of our algorithm lies close
to that of the oracle, and outperforms the K-SVD algorithm.
However, we note that this is not the case for k ≥ 4.
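
The oracle's coding step can be sketched as follows. This is a minimal, generic block-OMP written for a single signal, under the assumption that the block structure is stored as a vector d whose j-th entry is the block index of atom j; the function name `bomp` and the toy orthonormal dictionary (chosen so that recovery is guaranteed in this example) are illustrative and are not taken from the experiments above.

```python
import numpy as np

def bomp(x, D, d, k):
    """Block-OMP sketch: greedily select k blocks of D to approximate x.

    d[j] is the block index of atom j. Returns the coefficient vector theta.
    """
    d = np.asarray(d)
    blocks = np.unique(d)
    residual = x.copy()
    chosen = []
    support = np.zeros(D.shape[1], dtype=bool)
    coef = np.zeros(0)
    for _ in range(k):
        # Score each unused block by the norm of its correlation with the residual.
        scores = [np.linalg.norm(D[:, d == b].T @ residual) if b not in chosen else -1.0
                  for b in blocks]
        best = blocks[int(np.argmax(scores))]
        chosen.append(best)
        support |= (d == best)
        # Re-fit all atoms of the selected blocks jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    theta[support] = coef
    return theta

# Toy oracle setting: known dictionary and block structure (4 blocks of size 3).
rng = np.random.default_rng(0)
D_true, _ = np.linalg.qr(rng.standard_normal((12, 12)))  # orthonormal atoms
d_true = np.repeat(np.arange(4), 3)
theta_true = np.zeros(12)
theta_true[d_true == 1] = [1.0, -2.0, 0.5]
theta_true[d_true == 3] = [0.7, 0.3, -1.2]
x = D_true @ theta_true

theta_hat = bomp(x, D_true, d_true, k=2)
```

Applying the same routine to every column of X yields the oracle representation matrix; only the coding step is performed, since the dictionary and block structure are given.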

Finally, we wish to evaluate the contribution of the SAC
algorithm to the overall framework. One could possibly fix an
initial block structure and then iteratively update the dictionary
using BK-SVD, in the hope that this will recover the block
structure. Fig. 5 shows that the representation error e is much lower
when SAC is included in the overall framework. Moreover,
BK-SVD alone consistently fails to recover the dictionary blocks.



D. Choosing the maximal block size 

We now consider the problem of setting the maximal block
size in the dictionary, when all we are given is that the sizes of
the blocks are in the range [s_l, s_h]. This also includes the case
of varying block sizes. Choosing the maximal block size s to
be equal to s_l does not allow blocks containing more than
s_l atoms to be reconstructed successfully. On the other hand, setting
s = s_h causes the initial sparse representation matrix Θ,
obtained by the OMP algorithm, to contain too many non-zero
coefficients. This is experienced as noise by the SAC
algorithm, and may prevent it from functioning properly. It is
therefore favorable to use OMP with only k × s_l non-zero entries,
while setting the maximal block size s to be s_h.
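
The recommendation above amounts to decoupling two parameters: the sparsity budget of the initial OMP step (k × s_l) and the block-size cap passed to the block structure update (s_h). The sketch below shows this with a plain OMP routine; the function name `omp`, the values of k, s_l and s_h, and the toy orthonormal dictionary are assumptions made for illustration only.

```python
import numpy as np

def omp(x, D, n_nonzero):
    """Plain OMP: pick n_nonzero atoms one at a time, re-fitting by least squares."""
    support = []
    residual = x.copy()
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -1.0          # never re-select an atom
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    theta = np.zeros(D.shape[1])
    theta[support] = coef
    return theta

k, s_l, s_h = 2, 2, 3        # illustrative values: block sizes lie in [2, 3]
budget = k * s_l             # OMP budget for the initial representation: 4 entries
max_block_size = s_h         # cap that would be handed to the block structure update

# Toy signal built from two blocks of size 2 (atoms 0-1 and 5-6).
rng = np.random.default_rng(1)
D, _ = np.linalg.qr(rng.standard_normal((12, 12)))
theta_true = np.zeros(12)
theta_true[[0, 1]] = [1.5, -0.8]
theta_true[[5, 6]] = [0.6, 2.0]
x = D @ theta_true

theta_init = omp(x, D, budget)
```

Had the budget been k × s_h = 6 instead, OMP would be forced to select two atoms outside the true support for this signal, which is the kind of spurious entry that the text describes as noise for SAC.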

In Fig. 6(a), we evaluate the ability of our block-sparsifying
dictionary design algorithm to recover the optimal dictionary,
which contains 12 blocks of size 3 and 12 blocks of size 2. As
expected, better results are obtained when choosing s_l = 2. In
Fig. 6(b), the underlying block subspaces are all of dimension
2, but s_h is erroneously set to 3. We see that when s_l = 2,
we succeed in recovering a considerable part of the blocks,
even though blocks of size 3 are allowed. In both simulations,
K-SVD uses k × s_h non-zero entries, which explains why it
is not significantly outperformed by our algorithm in terms of
representation error. Moreover, the percentage of blocks
reconstructed by our algorithm is relatively low compared to the
previous simulations, due to the small block sizes.

V. Conclusions 

In this paper, we proposed a framework for the design of 
a block-sparsifying dictionary given a set of signals and a 
maximal block size. The algorithm consists of two steps: a 
block structure update step (SAC) and a dictionary update step 
(BK-SVD). When the maximal block size is chosen to be 1, 
the algorithm reduces to K-SVD. 

We have shown via experiments that the block structure 
update step (SAC) provides a significant contribution to the 
dictionary recovery results. We have further shown that for 
s > 1 the BK-SVD dictionary update step is superior to the 
K-SVD dictionary update. Moreover, the representation error 
obtained by our dictionary design method lies very close to the 
lower bound (the oracle) for all noise levels. This suggests that 
our algorithm has reached its goal in providing dictionaries 
that lead to accurate sparse representations for a given set of 
signals. 

To further improve the proposed approach, one could try to
make the dictionary design algorithm less susceptible to local
minimum traps. Another refinement could be to replace blocks
in the dictionary that contribute little to the sparse
representations (i.e., "unpopular blocks") with the least represented signal
elements. This is expected to only improve reconstruction
results. Finally, we may replace the time-efficient BOMP
algorithm with other block-sparse approximation methods. We
leave these issues for future research.

Fig. 6. Simulation results of our overall algorithm (BK-SVD+SAC) and
the K-SVD algorithm, with maximal block size s_h = 3. The graphs show
the reconstruction error e and the recovery percentage p as a function of the
number of iterations. (a) contains 12 blocks of size 2 and 12 blocks of size
3. (b) contains 30 blocks of size 2.